Compiling Data Dependent Control Flow on SIMD GPUs
نویسنده
چکیده
Current Graphic Processing Units (GPUs) (circa. 2003/2004) have programmable vertex and fragment units. Often these units are implemented as SIMD processors employing parallel pipelines. Data dependent conditional execution on SIMD architectures implemented using processor idling is inefficient. I propose a multi-pass approach based on conditional streams which allows dynamic load balancing of the fragment units of the GPU and better theoretical performance on programs using data dependent conditionals and loops. The proposed system can be used to turn the fragment unit of a SIMD GPU into a stream processor with data dependent control flow.
منابع مشابه
Dynamic Warp Formation: Exploiting Thread Scheduling for Efficient MIMD Control Flow on SIMD Graphics Hardware
Recent advances in graphics processing units (GPUs) have resulted in massively parallel hardware that is easily programmable and widely available in commodity desktop computer systems. GPUs typically use single-instruction, multiple-data (SIMD) pipelines to achieve high performance with minimal overhead for control hardware. Scalar threads running the same computing kernel are grouped together ...
متن کاملStack-less SIMT reconvergence at low cost
Parallel architectures following the SIMT model such as GPUs benefit from application regularity by issuing concurrent threads running in lockstep on SIMD units. As threads take different paths across the control-flow graph, lockstep execution is partially lost, and must be regained whenever possible in order to maximize the occupancy of SIMD units. In this paper, we propose a technique to hand...
متن کاملCharacterization and Transformation of Unstructured Control Flow in GPU Applications
Hardware and compiler techniques for mapping data-parallel programs with divergent control flow to SIMD architectures have recently enabled the emergence of new GPGPU programming models such as CUDA and OpenCL. Although this technology is widely used, commodity GPUs use different schemes to implement it, and the performance limitations of these different schemes under real workloads are not wel...
متن کاملControl Flow Emulation on Tiled SIMD Architectures
Heterogeneous multi-core and streaming architectures such as the GPU, Cell, ClearSpeed, and Imagine processors have better power/ performance ratios and memory bandwidth than traditional architectures. These types of processors are increasingly being used to accelerate compute-intensive applications. Their performance advantage is achieved by using multiple SIMD processor cores but limiting the...
متن کاملData Management and Control-Flow Aspects of an SIMD/SPMD Parallel Language/Compiler
Features of an explicitly parallel programming language targeted for reconfigurable parallel processing systems, where the machine's -1processing elements (PE's) are capable of operating in both the SIMD and SPMD modes of parallelism, are described. The SPMD (Single Program-Multiple Data) mode of parallelism is a subset of the MIMD mode where all processors execute the same program. By providin...
متن کامل